Genomic Insights Into the Archaea Inhabiting an Australian Radioactive Legacy Site

During the 1960s, small quantities of radioactive materials were co-disposed with chemical waste at the Little Forest Legacy Site (LFLS, Sydney, Australia). The microbial function and population dynamics in a waste trench during a rainfall event have been previously investigated revealing a broad abundance of candidate and potentially undescribed taxa in this iron-rich, radionuclide-contaminated environment. Applying genome-based metagenomic methods, we recovered 37 refined archaeal MAGs, mainly from undescribed DPANN Archaea lineages without standing in nomenclature and ‘Candidatus Methanoperedenaceae’ (ANME-2D). Within the undescribed DPANN, the newly proposed orders ‘Ca. Gugararchaeales’, ‘Ca. Burarchaeales’ and ‘Ca. Anstonellales’, constitute distinct lineages with a more comprehensive central metabolism and anabolic capabilities within the ‘Ca. Micrarchaeota’ phylum compared to most other DPANN. The analysis of new and extant ‘Ca. Methanoperedens spp.’ MAGs suggests metal ions as the ancestral electron acceptors during the anaerobic oxidation of methane while the respiration of nitrate/nitrite via molybdopterin oxidoreductases would have been a secondary acquisition. The presence of genes for the biosynthesis of polyhydroxyalkanoates in most ‘Ca. Methanoperedens’ also appears to be a widespread characteristic of the genus for carbon accumulation. This work expands our knowledge about the roles of the Archaea at the LFLS, especially, DPANN Archaea and ‘Ca. Methanoperedens’, while exploring their diversity, uniqueness, potential role in elemental cycling, and evolutionary history.


INTRODUCTION
The Little Forest Legacy Site (LFLS) is a low-level radioactive waste disposal site in Australia that was active between 1960 and 1968. During this period, small quantities of both short and long-lived radionuclides (including plutonium and americium) were disposed of in threemetre deep, unlined trenches, and covered with a shallow (1 m layer) of local soil, as was common practice at the time (Payne, 2012). The location of the LFLS site was partly chosen because of its clay and shale rich stratigraphy, intended to limit the migration of radionuclides from site. An unintended consequence of this, is that surface water infiltrates into the more porous trench waste form at a greater rate when compared to the surrounding geology, and during intense and prolonged rainfall events the contaminated water levels in trenches fill up, akin to a 'bathtub' (Payne et al., 2013(Payne et al., , 2020. Aside from providing a mechanism enabling the export of contaminants, occurring when trench waters (infrequently) overflow out of the 'bathtub' to the surrounding environment (Payne et al., 2013), the periodic influx of oxic waters into these natural reducing zones results in transitory microbial population shifts associated with pronounced elemental cycling (Vázquez-Campos et al., 2017;Kinsela et al., 2021). Our previous research showed that the archaeal community in the LFLS waste trenches, while constituting a minor portion of the whole microbial community, included a number of potentially interesting members, either in terms of phylogeny and/or functionality (Vázquez-Campos et al., 2017).
The domain Archaea constitutes by far the most understudied domain of life. Archaea have been traditionally regarded as biological oddities and marginal contributors to the global geochemical elemental cycles, with most living in extreme environments (Dombrowski et al., 2019). However, the continuous development of cultivation-independent techniques in recent decades has completely changed this perception. For example, "Thaumarchaeota" are now known to be key components in the global nitrogen cycle (Kitzinger et al., 2019) and can even induce systemic resistance to pathogens in plants (Song et al., 2019). Some Euryarchaeota, which are capable of the anaerobic oxidation of methane (AOM), consume up to 80-90% of the methane produced in environments such as rice fields and ocean floors before this highly deleterious greenhouse gas can be released into the atmosphere (Reeburgh, 2007).
The DPANN Archaea, so-named from its constituent lineages when it was defined ('Candidatus Diapherotrites, ' 'Ca. Parvarchaeota, ' 'Ca. Aenigmarchaeota, ' 'Ca. Nanohaloarchaeota, ' and "Nanoarchaeota") (Rinke et al., 2013), is an archaeal superphylum generally characterised by the small dimensions of its members (excluding 'Ca. Altiarchaeota'). Their diminutiveness not only relates to their reduced physical size, many of which are capable of passing through a standard 0.2 µm filter (Luef et al., 2015), but also due to their small genome sizes, i.e., many are below 1 Mbp and with an estimated maxima of ∼1.5 Mbp. Many of the DPANN with extremely reduced genomes lack biosynthetic pathways for amino acids, nucleotides, vitamins, and cofactors, which aligns with a hostdependent lifestyle (Dombrowski et al., 2019). Others, towards the higher end of the genome size scale, typically encode for some or most of the aforementioned pathways and are regarded as free-living. Irrespective of size, only a limited number of organisms from this lineage have been studied in detail (Waters et al., 2003;Baker et al., 2006;Castelle et al., 2015;Youssef et al., 2015;Golyshina et al., 2017).
Here we describe the results from the analysis of 37 new archaeal metagenome-assembled genomes (MAGs) with different degrees of completeness, derived from samples collected in a legacy radioactive waste trench at the LFLS during a redox cycling event. We propose 23 new Candidatus taxa (Murray and Schleifer, 1994;Murray and Stackebrandt, 1995) based on data analysed in this manuscript, as well as four missing taxa definitions. We present evidence of four undescribed DPANN lineages and six non-conspecific 'Ca. Methanoperedens spp.' genomes, while exploring their uniqueness and potential role in elemental cycling.

MATERIALS AND METHODS
In order to better understand the archaeal contributions in LFLS groundwaters and their influence upon site biogeochemistry, raw sequencing reads from ENA project PRJEB14718 (Vázquez-Campos et al., 2017) were reanalysed using genome-based metagenomic methodologies. Briefly, samples were collected in triplicate over a period of 47 days at four time-points, i.e., 0 (SAMEA4074280-2), 4 (SAMEA4074283-5), 21 (SAMEA4074286-8), and 47 (SAMEA4074289-91) days, after an intense rainfall event that filled the trench. Thorough chemical and radiochemical analyses were conducted on the samples in order to fully characterise the geochemistry (Vázquez-Campos et al., 2017). Over this time period, the redox conditions transitioned from slightly oxic to increasingly reducing/anoxic, reflected in both the chemical analyses and the community profile, with a clear increase in obligate anaerobes on days 21 and 47 (Vázquez-Campos et al., 2017).
DNA sequencing was performed in a NextSeq 500 (Illumina) with a 2 × 150-bp high-output run.
Contigs > 2.5 kbp were binned using CONCOCT (Alneberg et al., 2014) while limiting the number of clusters to 400 to avoid 'fragmentation errors, ' within anvi'o v2.3 (Eren et al., 2015) following the metagenomics protocol as previously described (Eren et al., 2015). Bins of archaeal origin were manually refined in anvi'o. Completeness and redundancy were estimated with anvi'o single copy gene (SCG) collection, based on Rinke et al. (2013), as well as with CheckM v1.0.13 (Parks et al., 2015). In the case of CheckM, analyses were performed by both the standard parameters, with automatic detection of the marker gene set, and by forcing the universal Archaea marker gene collection.

Phylogenomics and Phylogenetics
Prodigal v2.6.3 (-p meta) was used to predict proteins from the archaeal MAGs and 242 archaeal reference genomes (Supplementary Table 1). Predicted proteins were assigned to archaeal Clusters of Orthologous Groups (arCOGs) using the arNOG database of eggNOG v4.5.1 (Huerta-Cepas et al., 2016) using eggNOG-mapper v0.99.2 (Huerta-Cepas et al., 2017. Markers for the archaeal phylogenomic tree (rp44) were selected from the universal and archaeal ribosomal protein list from Yutin et al. (2012) with the following criteria: (i) present in a minimum of 80% of the collection of reference genomes and (ii) have a limited level of duplications (standard deviation of copy number in genomes with given arCOG < 1.25). A total of 44 ribosomal proteins were selected (Supplementary Table 2).
The concatenated archaeal protein tree was constructed using the posterior mean site frequency (PMSF) model  in IQ-TREE v1.5.5 (Nguyen et al., 2015), a faster and less resource-intensive approximation to the C10-C60 mixture models (Le et al., 2008), which are also a variant of Phylobayes' CAT model. Briefly, a base tree was inferred using a simpler substitution model such as LG. This tree was used as the base to infer the final tree under the optimum model suggested by IQ-TREE, LG+C60+F+I+G, with 1000 ultrafast bootstrap replicates (Minh et al., 2013).
Taxonomic identities of the recovered MAGs were initially also evaluated with GTDB-Tk v0.3.0  using GTDB r89 . The program was run with both classify_wf and then with de_novo_wf in order to attain a better placement for the MAGs with a limited number of closely related references (Supplementary Table 3).

Phylogenetic Placement of New Taxa in the DPANN Tree
In addition to the rp44 and GTDB-tk analyses, phylogenetic placement of the LFLS DPANN was investigated with additional methods based on a dataset containing all the representative genomes of DPANN from GTDB r202 (312, not including the LFLS MAGs already in r202), the 25 LFLS DPANN MAGs, and 11 non co-generic representatives from 'Ca. Altiarchaeota' as outgroup.
Three methods were used for the DPANN-specific phylogenomic analysis: (i) concatenated proteins; (ii) partitioned analysis; and (iii) multispecies coalescence (MSC) model with ASTRAL (Zhang et al., 2018). For these, the protein sequences used for tree inference were derived from the 122 SCGs used for the archaeal GTDB (ar122) with certain caveats, as 29 proteins were present in <50% of the full dataset (348 genomes) and therefore automatically eliminated from the concatenated and partitioned trees. In all cases, each marker was aligned with MAFFT-L-INS-i v7.407 (Yamada et al., 2016). i) Concatenated protein (dpann-cat): a concatenated multiple sequence alignment (MSA) was generated as for the rp44 dataset using MAFFT and BMGE. The tree was generated using the PSMF model. Briefly, an initial tree was generated with FastTree v2.1.10 (-lg -gamma) (Price et al., 2010). The best model was searched with IQ-TREE v2.1.2 for all supported mixture models plus iterations of LG with C10-C60 models (Le et al., 2008). The final tree was created with model LG+C60+F+I+G and 1000 ultrafast bootstrap replicates (Hoang et al., 2018). ii) Partitioned analysis (dpann-part): the MSA derived from the concatenated protein tree was used as input where each partition was defined as per each protein. A total of 93 partitions with a length of 49-733 amino acids were included in the analysis. Substitution models were selected individually per partition. Support values in IQ-TREE were calculated based on 1000 ultrafast bootstrap replicates with nearest neighbour interchange optimisation and --sampling GENESITE. iii) Supertree with ASTRAL (dpann-astral): sequences for each marker were aligned with MAFFT-L-INS-i and trimmed with BMGE v1.2 with options defined above. Individual ML trees were generated for each marker with IQ-TREE v2.1.2 and the suggested model for each, with 1000 ultrafast bootstrap replicates with nearest neighbour interchange optimisation (Hoang et al., 2018). The resulting ML trees were used as input for ASTRAL-III v5.7.7 (Zhang et al., 2018) with multi-locus bootstrapping (-b -r 1000).
Prior iterations of these analyses performed with the DPANN representatives from GTDB r89 (dpann_r89, 150 reference genomes and 10 'Ca. Altiarchaeota' outgroup sequences) and dpann_r89 plus 4 representatives from 'Ca. Fermentimicrarchaeales' (dpann_r89F) were also included as part of the discussion on the placement of some of the LFWA lineages.
Key biogeochemical enzymes were identified based on signature hmm profiles in the InterProScan output or by custom hmm profiles not integrated in databases Dombrowski et al., 2017). Custom hmm profiles were searched with HMMER v3.1b2 (-E 1e-20) (Eddy, 2011). Proteins with single HMMER matches below previously established thresholds Dombrowski et al., 2017) were searched against Swissprot and checked against the InterProScan results to reduce the incidence of false negatives.
High-heme cytochromes (≥10 binding sites per protein) were predicted using a modified motif_search.py script 2 that uses CX(1,4)CH as well as the more canonical CXXCH hemebinding motif.
Prediction of tRNA and rRNA genes was performed with tRNAscan-SE 2.0 (Chan et al., 2019) and Barrnap v0.9 3 respectively. Total tRNAs are reported based on the 20 default aminoacyl-tRNAs plus the initiator tRNA (iMet).

Multivariate Analysis of DPANN Genomes
DPANN MAGs from LFLS were compared with extant publicly available genomes based on functional and compositional characteristics. In addition to the genomes in the rp44 dataset, all (165) DPANN published genomes in IMG were downloaded (November 15, 2019). Genomes from rp44 and IMG were compared with MASH v2.2.2 (Ondov et al., 2016) to remove redundant genomes, resulting in 133 additional DPANN genomes to be added to the dataset (Supplementary Table 4).
Functional data included in the analysis included COG categories, PSortb predictions, TCDB, MEROPS and CAZy main categories as percentage of the total proteins predicted in each genome. Compositional data included the proportion of each amino acid in the predicted proteome, GC content and coding density. Before analysis, all reference genomes with %C < 70% (based on CheckM Archaea) were removed.
Multivariate analysis was performed on zero-centred scaled data via Sparse Principal Component Analysis (sPCA) (Shen and Huang, 2008) as implemented in the MixOmics R package v6.13.21 (Rohart et al., 2017) and limiting the number of explanatory variables per component to five.

Completeness of DPANN Metagenome-Assembled Genomes
Due to the limited completeness values obtained for DPANN genomes -even those in the literature reported to be assembled in single contigs (Supplementary Table 5), four different archaeal or prokaryotic SCG collections were examined to assess each of the constituent markers with regard to their suitability for being included as markers for reporting completeness of DPANN MAGs. In addition to the already calculated completeness based on 162 SCGs) (Eren et al., 2021) and CheckM Archaea (149 SCGs) (Parks et al., 2015), two additional SCG libraries were utilised: anvi'o Archaea76 (76 SCG -current archaeal SCG in anvi'o) (Lee, 2019), and CheckM prokaryote (56 SCGs, forced prokaryote/root SCG collections from CheckM) (Parks et al., 2015).
An initial set of DPANN genomes based on the one used for the multivariate analysis were filtered at 50%C (193 genomes) and 75%C (101 genomes) based on completeness values obtained by CheckM Archaea. All individual markers in all SCG collections were evaluated for their prevalence in each set of genomes. Redundant markers across SCG collections were also compared in order to filter potential universal SCGs that might be represented by best/flawed HMM profiles. Initial marker filtering required a prevalence of ≥75% across all DPANN genomes, and a prevalence within classic DPANN lineages of >50% to account for the limited number of highly complete genomes.

Proposed Taxa
Details on etymology and nomenclature can be found in the Taxonomic Appendix. A summary of the proposed taxa can be found in Table 1. Additional details about the proposed species can be found in Supplementary Table 6.
Minimal quality criteria for denominating new candidate species was based on Chuvochina et al. (2019). Relaxed parameters were allowed for selected genomes when phylogenetic/taxonomic novelty was deemed of special relevance: completeness ≥ 90%, redundancy < 10%, ≥18 tRNA, and 16S rRNA gene sequence ≥ 1000 bp. MAGs passing those criteria, different to extant named species (ANI < 95%) and with no described higher taxonomic levels, were considered to describe new taxa.
Relative evolutionary divergence (RED) (Parks et al., 2018) values were generated with phylorank v0.1.10 6 for the dpann-cat tree based on the taxonomy levels from GTDB r202. Obtained RED values were used to circumscribe the limits of new and extant lineages from genus to class level. For lower taxonomic levels, AAI and ANI calculations were also used (Konstantinidis et al., 2017). AAI was used to evaluate delimitations between families, genera, and, to a lesser extent, species. ANI was only utilised for genus and species levels.

LFW-283_2 b
Descriptions of the taxa in this table are included in the Taxonomic Appendix with additional details on the proposed type materials in Supplementary Table 6. a new proposal to fill gaps in taxonomic ranks. b obtained in this study.

Binning Results and Archaeal Community
The sequencing output totalled 236,581,129 paired-end reads and 71.1 Gbp of data across the whole dataset. The de novo coassembly of sequence reads from the LFLS trench subsurface water samples (Vázquez-Campos et al., 2017) generated 187,416 contigs with a total length of 1.32 Gbp (based on 2.5 kbp cutoff).
Binning with CONCOCT generated 290 initial bins. A total of 21 bins with clear archaeal identity or with ambiguous identity (similar completeness scores for bacterial and archaeal SCG profiles) were further refined with anvi'o, producing 37 curated archaeal MAGs with completeness ≥50% (%C from here on; 22 of ≥90%C) and redundancy ≤10% (%R from here on) ( Table 2 and  Supplementary Table 8 for detailed completeness assessment). Initial phylogenomic reconstruction was based on the 16 ribosomal proteins by Hug et al. (2016) with the exception of rpL16 due to frequent misannotation (Laura Hug, pers. comm.) (data not shown) using the Phylosift HMM profiles (Darling et al., 2014). The resulting trees showed relatively poorly resolved basal branches in addition to the absence of rpL22 genes from virtually all DPANN Archaea. We found this to also be the case with other known HMM profile-based databases such as TIGRFAM (Haft et al., 2013). As an example, the rpL14p in DPANN Archaea is often overlooked by the TIGR03673 even when the protein is present as per the more suitable model defined by arCOG04095. Subsequently, we used the more up to date arNOG and increased the number of phylogenetic markers to 44 universal and archaeal ribosomal proteins. Phylogeny based on the 44 concatenated ribosomal proteins (rp44, Figure 1) showed that most metagenome assembled genomes (MAGs) from LFLS belonged to diverse DPANN lineages and Methanomicrobia ('Ca. Methanoperedenaceae' and Methanotrichaceae) ( Table 2). Phylogenomic analysis based on the rp44 (Figure 1) showed a general tree topology largely consistent with current studies, e.g., 'Ca. Asgardarchaeota' as sister lineage to TACK and 'Ca. Altiarchaeota' as sister to DPANN (Zaremba-Niedzwiedzka et al., 2017;Dombrowski et al., 2019). Most high level branches showed UF bootstrap support values >90%, with the exception of the very basal Euryarchaeota, which is known to be difficult to resolve (Adam et al., 2017). The position of the MAGs in the archaeal phylogeny suggests four undescribed lineages (Figure 1) within DPANN, denoted LFWA-I to -IV. These lineages correspond to representatives of high-level taxa (order or above) that do not have either a current proposed name or have not been explored in detail in the literature. As a means of honouring the Australian Aboriginal community, and very especially, the traditional owners of the land where LFLS is located, many of the nomenclatural novelties proposed are based on terms from Aboriginal languages related to the site (mainly Dharawal), see Taxonomic Appendix. A summary of the nomenclatural proposals derived from this work are summarised in Table 1.
During the drafting of this manuscript, a genome-based taxonomy was developed for Bacteria and Archaea (GTDB) (Parks et al., 2018). For all proposed lineages, we have included the possible correspondence with the GTDB taxonomy (r202) wherever necessary using the GTDB nomenclature with prefixes, e.g., "p__" for phylum, to avoid confusion with more "typically" defined lineages with the same names. It should be noted that due to the delay between the release of the pre-print and MAGs data, and the publication of this manuscript, 27 of the 37 MAGs from this work were included in the GTDB r202 as representatives.

The Archaeal Community in Detail
Based on bin coverage information, the archaeal community was dominated by DPANN, esp. 'Ca. Pacearchaeota' and LFWA-III (see below), and Euryarchaeota, esp. 'Ca. Methanoperedenaceae, ' with maximum relative abundance values of 79.7% at day 4, and 42.8% at day 47, respectively (Supplementary Table 9). Our previous analysis indicated that DPANN constituted a maximum of 55.8% of the archaeal community (Vázquez-Campos et al., 2017). This discrepancy could be an artefact derived from the different copy numbers of rRNA gene clusters, often limited to one in DPANN and TACK, compared to other Archaea with more typically sized genomes containing multiple copies of the operon.
Based on the functional annotation of key proteins of biogeochemical relevance (Figure 2 and Supplementary Table 10), the archaeal MAGs recovered from LFLS do not appear to play a major role in the sulfur cycle (lacking dissimilatory pathways, Supplementary Table 10). Regarding the nitrogen cycle, the presence of nitrogenases in some of the 'Ca. Methanoperedens spp.' MAGs (Figure 2 and Supplementary Table 10) suggest that they may play a larger role in the fixation of N 2 than in the dissimilatory reduction of nitrate, particularly given the negligible nitrate concentrations measured (Vázquez-Campos et al., 2017). Another Archaea likely to be heavily involved in the nitrogen cycle, but through the degradation of proteins and D-amino acids, is the Thermoplasmata LFW-68_2, based on the unusually high number of proteases, and specific proteases and transporters encoded in its genome (  Table 11; see Supplementary Information for details). The D-amino acids are one of the most recalcitrant components of necromass and are often found in bacterial cell walls as well as enriched in aged sediments as a result of amino acid racemisation (Lomstein et al., 2012;Lloyd et al., 2013). Thus, Thermoplasmata LFW-68_2 may have a role in the remobilisation and remineralisation of organic matter in sediments.
Due to their special interest and relevance, the content below is focused on the new DPANN and the 'Ca. Methanoperedens spp.' from LFLS. Aspects related to other MAGs and other general observations regarding the whole archaeal community are contained within the Supplementary Materials.

LFWA-I: Order 'Ca. Tiddalikarchaeales'
The LFWA-I lineage is a sister lineage to 'Ca. Parvarchaeota' (ARMAN-4 and -5) (Baker et al., 2006;Rinke et al., 2013) in rp44 and contains a unique MAG, LFW-252_1 ('Ca. Tiddalikarchaeum anstoanum') ( Figure 1 and Supplementary Figure 2). LFWA-I corresponds to the order and family o__CG07-land f__CG07-land within the c__Nanoarchaeia as per GTDB. As with many other DPANN from the PANN branch (such as cluster 2 in Dombrowski et al., 2020) %C estimates with existing Archaea SCG collections are severely underestimated with barely 79.0%C reported by CheckM (best). The%C combined with the assembly size sets a complete genome size >1.5 Mbp, well above the expected genome size for a "Nanoarchaeota", and similar to that for many PANN MAGs (Supplementary Table 8). A dedicated SCG collection was built from extant HMM profiles by removing SCGs normally absent from all DPANN and/or from specific sub-lineages. This first attempt of a tailored SCG library for assessing the completeness of DPANN genomes (Supplementary Table 12), provides more sensible results not only for LFW-252_1 (95.7%C/1.1%R) but also for DPANN assemblies in general, including some well-studied references (Supplementary Table 5).
The LFW-252_1 lacks the biosynthetic capacity for almost all amino acids, and for the de novo enzymes biosynthesising FIGURE 1 | Phylogenomic analysis of the archaeal MAGs found at LFLS. Concatenated protein tree constructed with 44 universal and Archaea-specific ribosomal proteins from 230 reference genomes and 36 original MAGs from this work. Tree was constructed with IQ-TREE under the PMSF + LG + C60 + F + I + G. Lineages with representatives at LFLS are shown coloured and uncollapsed, with labels in blue. Inner ring indicates the major archaeal lineages. Shortened branches are shown at 50% of their length. Black circles indicate ultrafast bootstrap support values > 90%.
purine and pyrimidine, despite having a normal enzymatic representation for interconversion of purine nucleotides, and interconversion of pyrimidine nucleotides (Supplementary Figure 3). With the additional lack of a proper pentose biosynthesis pathway and only containing the last enzyme required for the biosynthesis of 5-phospho-α-D-ribose 1diphosphate (PRPP), LFW-252_1 is very likely to depend on a host, maybe aerobic or microaerophilic, although this is only based on the relative abundance patterns. Reactive oxygen species detoxification is mediated by superoxide reductase (SOR, neelaredoxin/desulfoferredoxin; TIGR00332) and rubredoxin. LFW-252_1 is a subsurface decomposer of complex carbohydrates based on the CAZymes associated to the degradation of lignocellulosic material, including carbohydrate esterases (CE1, CE12, and CE14), β-mannase (GH113), and even one vanillyl-alcohol oxidase (AA4) (Figure 2 and Supplementary Table 13). Its genome also harbours three different sialidases (GH33), at least two extracellular, making it one of the few non-pathogenic prokaryotes containing this enzyme (Buelow et al., 2016). In addition to the sialic acids as sources of C and N, LFW-252_1 also contains two extracellular serine peptidases (S08A, subtilisin-like), indicating a possible active role in the degradation of proteins in the environment (Supplementary Table 11). Energy production is based on the Embden-Meyerhof-Parnas (EMP) pathway, or the ferredoxin-mediated oxidative decarboxylation of pyruvate (PorABC), 2-oxoglutarate (KorAB), and, likely, other oxoacids (OforAB), i.e., fermentation of carbohydrates FIGURE 2 | Abundance of MEROPS subfamilies, CAZy families, and biogeochemically relevant proteins found in the MAGs from LFLS. MAGs are ordered according to the rp44 tree ( Figure 1). Counts of each protein are capped at 10 for easier visualization. Details about MEROPS, CAZy, and proteins of biogeochemical relevance, including exact counts and classes with predicted extracellular proteins can be found in Supplemenatry Tables 10, 11, 13, respectively. Plot created with ggtree v3.1.0 (Yu, 2020) and ggtreeExtra v1.0.4 (Xu et al., 2021). and amino acids. The EMP pathway, however, lacks the enzyme(s) required for the conversion of glyceraldehyde-3P (G3P) to 3-phosphoglycerate (3PG). Not a single subunit of the archaeal-type ATPase (A-ATPase), the electron transport chain (ETC) or the tricarboxylic acid (TCA) cycle were found. Pilus type IV (FlaIJK) but not archaellum is present.

LFWA-II: Order 'Ca. Norongarragalinales'
The lineage LFWA-II is represented in LFLS by MAGs LFW-144_1, and possibly LFW-156_1, and it is consistently placed as a sister clade to all other 'Ca. Micrarchaeia' (see taxonomic novelties; Figure 3 and Supplementary Figure 2). Both MAGs have a similar size but different completeness estimates and GC content: 93.5%C/1.1%R and 57.5% GC for LFW-144_1, and 90.3%C/8.6%R and 32.5% GC for LFW-156_1. Each of the LFWA-II MAGs belong to different families within o__UBA8480 (c__Micrarchaeia). 'Ca. Norongarragalina meridionalis' LFW-144_1, is the most complete genome within this order and co-generic with the only representative on the placeholder family f__0-14-0-20-59-11 in f__UBA93. LFW-156_1 is the sole representative of its own family (f__CABMDW01).
Functionally, LFW-144_1 is an auxotroph for most amino acids, displaying little capacity for amino acid interconversion (Supplementary Figure 4). Conversely, it carries a complete pentose-phosphate pathway and has the capacity for de novo biosynthesis of purine and pyrimidine nucleotides as well as nicotinate. Only the fructose-bisphosphate aldolase was undetected from the EMP pathway. While it lacks extracellular CAZymes (Figure 2), it has an extracellular metallopeptidase M23B (lysostaphin-like). M23 peptidases are often used to break bacterial cell walls as either a defence mechanism (bacteriocin) or for nutrition (e.g., predation or parasitism). Peroxide detoxification is mediated by rubrerythrin (PF02915), but it lacks SOR or superoxide dismutase (SOD). Energy production is derived from the fermentation of pyruvate (PorABC). Archaeal-type ATPase and type-IV pili are present. No ETC or TCA cycle components are present.
Despite the similar assembly size (<3 kbp difference) and similar %C compared to LFW-144_1, LFW-156_1 presents an even more limited genome, lacking complete de novo nucleotide biosynthesis, vitamin and cofactor biosynthetic pathways, TCA cycle, and pentose phosphate pathways. The EMP is nearly complete missing only the glucose-6-P isomerase. Conversely, it has three different extracellular glycoside hydrolases (GH74, GH57, and GH23) and one protease (S08A). Superoxide is decomposed via Fe/Mn SOD. It can ferment D-lactate (D-lactate dehydrogenase, LDH) but not pyruvate. Pilus type IV is present. Only four subunits of A-ATPase were detected (ABDI).

Anstonellales'
During the initial examination of the DPANN MAGs, a subset of 'Ca. Micrarchaeota' MAGs was differentiated from LFWA-II and 'Ca. Micrarchaeales' because of their biosynthetic capabilities. The lineage LFWA-III was the most diverse of all Archaea encountered at LFLS with 9 MAGs belonging to three different families and 9 different genera based on AAI values and phylogeny (Supplementary Tables 14, 15).
Further detailed phylogenetic analyses of DPANN, suggest though that LFWA-III is actually several similar but distinct orders (Figure 3). Two of the possible orders, LFWA-IIIa and LFWA-IIIb, are poorly represented taxa whose position on the tree is easily affected by the taxa included and substitution model (Supplementary Figures 5, 6). For example, in the dpann_r89F trees, LFWA-IIIa appears as sister of LFWA-IIIc (see below) and together constitute a sister lineage to LFWA-IIIb plus 'Ca. Fermentimicrarchaeales' (Supplementary Figure 6). However, in the concatenated and partitioned trees based on GTDB r202, LFWA-IIIa + LFWA-IIIb are sister to 'Ca.
Fermentimicrarchaeales, ' and these together sister to LFWA-IIIc. While based on the RED values from r202 would indicate LFWA-IIIa and LFWA-IIIb are likely to belong to the same order, their unstable position in respect to each other and to LFWA-IIIc and 'Ca. Fermentimicrarchaeales, ' they would be considered independent orders, as the GTDB r202 also does. In the case of LFWA-IIIc, it seems to be topologically consistent across all trees. LFWA-IIIc is equivalent to a pre-existing order in GTDB, o__UBA10214 (Figure 3).

Comparative functional analysis
In addition to their high abundance and diversity, LFWA-III is also an interesting lineage at a functional level, constituting one of the few DPANN with broad anabolic capabilities including the de novo biosynthesis of the amino acids lysine, arginine, and cysteine as well as purine and pyrimidine nucleotides (Figure 4). The metabolic capabilities of the top five LFWA-III genomes (the four named species plus LFW-29) were compared with five of the better characterised and/or complete DPANN genomes (Figure 4): 'Ca. Iainarchaeum andersonii' (Youssef et al., 2015) and AR10 (Castelle et al., 2015) ('Ca. Diapherotrites'), 'Ca. Micrarchaeum acidiphilum' ARMAN-2 (Baker et al., 2006) and 'Ca. Mancarchaeum acidiphilum' Mia14 (Golyshina et al., 2017) ('Ca. Micrarchaeota'), and "Nanoarchaeum equitans" Kin4-M (Huber et al., 2002;Waters et al., 2003) ("Nanoarchaeota"). In general, both reference and LFWA-III MAGs have type-IV pili, A-ATPases (except "Nanoarchaeum equitans") and lack ETC and TCA cycle components (except ARMAN-2). In terms of core metabolism, the representative genomes of LFWA-III share more similarities with 'Ca. Diapherotrites' than with the reference 'Ca. Micrarchaeota, ' e.g., de novo biosynthesis pathways for nucleotides, EMP pathway, de novo biosynthesis of nicotinate/nicotinamide or aromatic amino acids (Figure 4). The EMP pathway is also shared with other DPANN not shown in Figure 4, e.g., LFWA-II, LFWA-IV and 'Ca. Fermentimicrarchaeum limneticum, ' while the de novo biosynthesis pathways for nucleotides are also shared with the latter. However, LFWA-III also contains other unique features: pathways for the biosynthesis of L-arginine, L-lysine, L-cysteine, and L-histidine (only in AR10 in the reference genomes). LFW-121_3 and LFW-281_7 display most of the genes encoding the pathway for the biosynthesis of branched amino acids. Conversely, in the case of LFWA-IIIc, and likely also with 'Ca. Iainarchaeum andersonii, ' would likely rely on the interconversion of L-valine and L-leucine. LFW-121_3 in particular presents unique features that either distinguish it from other LFWA-III genomes and/or the reference genomes. Both LFW-121_3 and 'Ca. Iainarchaeum andersonii' are the only MAGs able to convert L-aspartate to L-threonine based upon current evidence.
Another interesting difference relates to the thiamine acquisition (Figure 4). Both Diapherotrites genomes have transporters for thiamine and/or its phosphorylated forms but lack any enzyme known in the de novo pathway for its biosynthesis. Instead, LFW-121_3 and LFW-281_7, as well as 'Ca. Fermentimicrarchaeum limneticum, ' are likely to be able to produce thiamine de novo provided a source for the thiazole branch of the thiamine pathway is available (see Supplementary Information for further discussion). None of the LFWA-IIIc MAGs have the biosynthetic pathway nor specific transporters for thiamine.
Like other DPANN, most LFWA-III are fermentative organisms. However, there are differences in the substrates used. Leaving aside LFW-281_7, for which we couldn't detect any of the proteins mentioned next, all other LFWA-III and ARMAN-2 can ferment 2-oxoglutarate (KorAB), and other 2oxoacids (OforAB). Pyruvate fermentation was detected in both Diapherotrites and ARMAN-2 but not in any LFWA-III. 'Ca. Anstonellaceae' MAGs (LFW-35, LFW-29) were the only ones with LDH, a differential characteristic from the other LFWA-III, but shared with 'Ca. Iainarchaeum andersonii.' LFW-121_3 was the only one with a predicted capacity for the fermentation of aromatic amino acids amongst the 10 genomes compared (IorAB). LFW-281_7 also lacks the [NiFe] group 3b hydrogenase present in the other LFWA-III and AR10. The [NiFe] group 3b hydrogenases are different from the [NiFe] group 4e found in 'Ca. Fermentimicrarchaeum limneticum' (Kadnikov et al., 2020). Both groups of hydrogenases are considered to be bidirectional under the proper conditions but they have some key differences. While the [NiFe] group 3b are cytosolic, O 2 -tolerant and couple the oxidation of NADPH to fermentative evolution of H 2 , the [NiFe] group 4e are membrane-bound, O 2 -sensitive and often associated to respiratory complexes (Søndergaard et al., 2016). Based on the lack of fermentative proteins and the associated hydrogenase, LFW-281_7 is likely the only non-fermenting member of LFWA-III.

LFWA-IV: Another Member of an Obscure DPANN Lineage
The MAG LFW-46 is the sole representative of the LFWA-IV lineage (Figure 1). While presenting good%C/%R levels for a functional description (92.5%C, 3.2%R, 43.8% GC), no rRNA genes aside from a partial 5S were recovered even when searching the assembly graph directly (thus no name is proposed). All the DPANN trees place LFW-46 as part of p__EX4484-52, sister to 'Ca. Nanohaloarchaeota' (Supplementary Figure 2). The LFW-46 genome is a typical example of a reduced, host-dependent DPANN (Supplementary Figure 7) with very limited amino acid interconversion pathways, lacking both the de novo nucleotide biosynthesis, the TCA cycle and the pentose phosphate pathways. It also lacks biosynthetic pathways for the production of vitamins and cofactors. However, LFW-46 differentiates itself by the relatively rich repertoire of transporters related to resistance, especially in comparison with the repertoire of other DPANN: multidrug, tetracycline, quinolones, arsenite, Cu + and Cu 2+ , and mono-/di-valent organocations (e.g., quaternary ammonium compounds). It relies on the EMP pathway, but the enzyme(s) for the conversion of glyceraldehyde-3P to 3-phosphoglycerate could not be detected by KofamScan. However, arNOG annotation identified GapN (arCOG01252) related to the glyceraldehyde-3-phosphate dehydrogenase [NAD(P) +] from Vulcanisaeta distributa (seed e-Value: 6.7·10 −101 ). The extracellular enzymes of LFW-46 include one protease (S08A), one glycoside hydrolase (GH57), and one polysaccharide lyase (PL9). ROS detoxification is mediated by neelaredoxin/desulfoferredoxin and rubrerythrin. LFW-46 ferments pyruvate for energy (PorABCD) and has no ETC components. Archaeal-type ATPase and type-IV pili are both present.

Multivariate Comparison of DPANN Genomes
Sparse Principal Component Analyses (sPCA) of the functional (e.g., COG, and MEROPS annotations) and/or compositional (amino acid composition of predicted proteins, and GC%) profiles of selected Archaea and DPANN genomes (only ≥ 75%C based on CheckM) were performed in order to reveal the overall distinctiveness of LFWA lineages. In the all-Archaea dataset with compositional + functional data, LFWA-III clustered closer to other DPANN genomes (Supplementary Figure 8) yet positioned near to what could be referred to as a 'transition zone' between DPANN and normal-sized archaeal genomes (e.g., Euryarchaeota and TACK). Compositional and functional analysis provided an insight into the uniqueness of LFWA-III: displaying high GC% (usually ∼50% or greater, Supplementary Figure 9) and an overall above-average number of COG-annotated proteins.
Functional sPCA of the 140 DPANN genomes enabled differentiation of the metabolically rich (e.g., LFWA-II, LFWA-III, 'Ca. Diapherotrites' and related) and acidophilic lineages from the other DPANN genomes (Figures 5A,B). This differentiation of LFWA-III was mainly shown in PC1 (Figure 5A; COG categories L, O, U, W) and PC3 (not shown; COG E, F, H and MEROPS C).
The high positive loadings in the compositional dataset along PC1 for Gly, Ala, and GC, and negative loadings for Ile and Asn may be due to the high GC content of most LFWA-III genomes ( Figure 5B and Table 2), considering that those amino acids are encoded by GC-rich and AT-rich codons respectively (Grosjean and Westhof, 2016). Very high positive values along PC2 separated the acidophilic DPANN related to the acidophilic 'Ca. Micrarchaeota' and 'Ca. Parvarchaeota' from the other DPANN.

Diverse and Abundant 'Ca. Methanoperedens spp.'
The occurrence of three methanogenic MAGs from the "Methanotrichaceae" family (previously indicated to be the illegitimate Methanosaetaceae, Vázquez-Campos et al., 2017) were principally and, unsurprisingly, detected during the highly anoxic phase (117-147 mV Standard Hydrogen Electrode). More interestingly though, was the recovery of MAGs belonging to the genus 'Ca. Methanoperedens.' A total of six MAGs were assembled in this study, 4 ≥ 95%C (all < 10%R). Each of these MAGs belong to different species based on ANI and AAI scores (Figure 6 and Supplementary Tables 16, 17), constituting the most diverse set of 'Ca. Methanoperedens spp.' genomes reconstructed to date in a single study and site, and all representing different species to previously sequenced MAGs.

Multiple Acquisition of Respiratory Molybdopterin Oxidoreductases
Most known 'Ca. Methanoperedens spp.' encode some molybdopterin oxidoreductase enzyme, which are usually referred to as NarG. All the 'Ca. Methanoperedens spp.' found at LFLS share a putative molybdopterin oxidoreductase that is not present in other members of the genus (Figure 7). Notably, this is different from any canonical or putative NarG (as is often suggested for other 'Ca. Methanoperedens spp.') with no match to HMM profiles characteristic of these proteins but, instead,  belonging to arCOG01497/TIGR03479 (DMSO reductase family type II enzyme, molybdopterin subunit).

High Heme Cytochromes Are Common in 'Ca. Methanoperedens spp.' Genomes
Metal respiration is usually associated with cytochromes with a high number of heme-binding sites (≥10) or high heme cytochromes (HHC from here on). The screening of the 'Ca. Methanoperedens spp.' MAGs from LFLS revealed multiple HHC in all genomes, with a minimum of six copies for LFW-280_1_1 (68.5%C, Figure 2 and Supplementary Table 10). All reference 'Ca. Methanoperedens spp.' genomes revealed similar abundances when considering their completeness, i.e., ∼10 HHC per genome. While most HHC gene clusters are singletons (48/73 gene clusters), there is one highly conserved gene cluster in the core genome (GC_00000082, median identity 75.5%) of 'Ca. Methanoperedens' containing 13 heme-binding sites. Another HHC that might be shared across all 'Ca. Methanoperedens' is the GC_00001534 with 11 heme binding sites, although it is missing from the less complete LFLS MAGs and further confirmation would be needed to assess if it could be part of the core genome and not an artefact due to low completeness. Although metal reduction has only been confirmed for a limited portion of the known 'Ca. Methanoperedens' that exist: 'Ca. Methanoperedens ferrireducens, ' 'Ca. Methanoperedens sp.' BLZ1, 'Ca. Methanoperedens manganireducens' and 'Ca. Methanoperedens manganicus' (Ettwig et al., 2016;Cai et al., 2018;Leu et al., 2020), our data suggests metal reduction might be a universal characteristic of 'Ca. Methanoperedens spp.'
In 'Ca. Methanoperedens spp., ' the genes associated with the biosynthesis of PHA are usually present as an operon (Supplementary Figure 10A). Although not explicitly reported in their respective manuscripts, PhaC (PHA synthase type III heterodimeric, TIGR01836) and PhaE (PHA synthase subunit E, PF09712) were predicted in all the 'Ca. Methanoperedens' reference genomes included in this study (Haroon et al., 2013;Arshad et al., 2015;Vaksmaa et al., 2017;Cai et al., 2018;Leu et al., 2020). However, while all PhaC proteins form a gene cluster in the pangenome analysis, PhaE does not (Figure 6). This could be related to their function: PhaC is a catalytic protein while PhaE 'simply' regulates PhaC (Koller, 2019), so any sequence modifications in PhaC might have a more critical effect on the biosynthesis of PHA. Both proteins, PhaC and PhaE, have also been predicted in three of the most complete new genomes in the present study (LFW-24, LFW-280_3_1 and LFWA-280_3_2). In most instances, PHA biosynthesis genes have been found in putative operons (Supplementary Figure 9A).

DISCUSSION
In this study we describe 23 DPANN Archaea candidate taxa (species to order), as well as proposing four supraspecific taxa to accommodate extant and newly proposed lineages according to the current taxonomy; proposed Candidatus names have been proposed (family to class), following the Linnaean system.
The archaeal community at LFLS, showed a clear dominance of DPANN and 'Ca. Methanoperedens spp., ' with the data sets' uniqueness highlighted by the unmatched MAGs (at species level) with any extant reference genomes. Amongst the DPANN MAGs recovered, a number of them constitute the first detailed description of lineages with placeholder names from large scale projects (Parks et al., 2017Rinke et al., 2020).

Self-Sustaining DPANN -Possibly More Common Than Previously Thought?
The DPANN archaea sensu stricto (i.e., not including 'Ca. Altiarchaeota') are often described as organisms with extremely reduced metabolic capacities and generally, with few exceptions, dependent on a host for the acquisition of essential metabolites such as vitamins, amino acids or even reducing equivalents (Dombrowski et al., 2019). During the metabolic reconstruction of individual archaeal genomes, it was observed that select DPANN from the LFWA-III group, exemplified by LFW-121_3 ('Ca. Gugararchaeum adminiculabundum'), possess nearcomplete pathways for the synthesis of amino acids, purine and pyrimidine nucleotides, riboflavin, and thiamine (Figure 4). These predictions, derived from either Pathway Tools (via Prokka annotations) or KEGG, were unusual given that when a DPANN genome lacks the biosynthetic capacity for an amino acid, all enzymes for that pathway are typically missing. However, several of the pathways in the 'Ca. Gugararchaeales, ' 'Ca. Burarchaeales' and 'Ca. Anstonellales' MAGs had a limited number of gaps (Figure 4). These alleged 'gaps, ' thoroughly discussed in the Supplementary Information with focus on LFW-121_3, were often related to unusual enzymes poorly annotated in databases, or steps known to be possibly performed by bifunctional or promiscuous enzymes. Manual curation with predicted functions from other databases combined with literature searches was able to fill some of those gaps or, at least, provide reasonable candidate proteins that may perform those functions.
The discovery of additional DPANN taxa with a "rich" metabolism, at least compared with host-dependant DPANN (e.g., "Nanoarchaeum," 'Ca. Nanopusillus, ' Mia14), highlights the metabolic diversity and widespread existence of free-living DPANN Archaea, at least within the Diapherotrites/Micrarchaeota lineage -DM branch or cluster 1 (Dombrowski et al., 2020) -as opposed to the PANN branch, suggesting a more generalised genome streamlining process/loss of function in the latter. Prior studies have remarked upon the generalised "limited metabolic capacities" in most DPANN (Dombrowski et al., 2019). However, many DPANN datasets are heavily enriched with lineages that are likely to be hostdependent, e.g., 'Ca. Pacearchaeota' or 'Ca. Woesearchaeota' (suggested to be orders in the GTDB), or derived from acidic environments, generally restricted to ARMAN lineages, esp. 'Ca. Micrarchaeaceae.' This study expands the phylum 'Ca. Micrarchaeota' with 3 orders, 5 families, 5 genera, and 5 species (Figure 3).
Several studies have reported that ultra-small Bacteria and Archaea can be lost using standard 0.22 µm filters (Luef et al., 2015;Chen et al., 2018). However, this may only affect individual cells, while host-DPANN cell associations would be retained and overrepresented compared to free-living DPANN, which are not necessarily larger than the host-associated organisms. The high abundance of the putative free-living LFWA-III in the metagenomic samples from LFLS could be related to the elevated concentrations of labile colloidal Fe that precipitates at the redox interface following the infiltration of oxic-rainwaters. Acting in a similar manner as a flocculant in water treatment, iron (oxyhydr)oxide microaggregates have immense capacity to trap organic matter and microbial cells (Tang et al., 2016). This process has been used in protocols to recover and concentrate viruses from environmental samples without the need of ultrafiltration (John et al., 2011). As such, the colloidal Fe would also have improved the recovery of free-living DPANNs and any other ultra-small microorganisms that, by default would have been otherwise lost during the filtration with the 0.22 µm filters used during sampling (Vázquez-Campos et al., 2017).
Several of the NarG related to NarG LFLS (from Halobacteria) have been confirmed to have certain promiscuity (Figure 7) and are known to be able to use (per)chlorate as a substrate in vivo (Yoshimatsu et al., 2000;Oren et al., 2014). Although it is close to impossible to predict the actual substrates of the NarG LFLS (or PcrA LFLS ), the phylogenetic analysis is, at least, suggestive of the potential utilisation of (per)chlorate. The oxidation of methane linked to the bioreduction of perchlorate has been proposed a number of times in the past Xie et al., 2018;Wu et al., 2019). Recent works Wu et al., 2019) have suggested the possible involvement of ANME archaea in (per)chlorate reduction based on chemical and amplicon data analyses, an assumption considered more likely given that nitrate was rarely measured in noticeable concentrations in the LFLS sampling trench (Vázquez-Campos et al., 2017). Indeed, nitrate concentrations were below the detection limit (<0.01 µM) before any 'Ca. Methanoperedens spp.' became relatively abundant. With LFLS historical disposal records indicating that quantities of perchlorate/perchloric acid were deposited in the vicinity of the sample location, it collectively suggests that 'Ca. Methanoperedens spp.' utilisation of nitrate is unlikely at the site. However, as no chlorite dismutase gene was found in any of the MAGs, the possibility that they may utilise perchlorate is also not well supported. Given the high iron concentrations, it is reasonable to assume that the AOM at LFLS mainly use Fe(III) electron acceptor due to its ready availability, although other redox-active metals detected in the trenches, e.g., Mn, might also be used (Vázquez-Campos et al., 2017).
The collective evidence indicates that nitrate reductase(-like) enzymes are rather widespread in 'Ca. Methanoperedens spp., ' with the exception of 'Ca. Methanoperedens ferrireducens' from which no nitrate reductase candidate could be detected, aside from an orphan NapA-like protein (cd02754). Phylogenetic analysis of the NarG and similar proteins (Figure 7) suggests that these proteins might have been horizontally acquired at least three different times during the evolution of 'Ca. Methanoperedens.' Congruently, the pangenomic analysis showed three different NarG variants in their respective gene clusters (Figures 6, 7).

High-Heme Cytochromes
Analysis of historical disposal records revealed that over 760 (intact and partially corroded) steel drums were deposited within the legacy trenches at LFLS, including in close proximity of the collected samples (Payne, 2012). The unabated redox oscillations which have occurred in the trenches over the last 60+ years and the resultant impact upon the steel drums, have likely contributed to the elevated concentrations of soluble iron observed (∼0.5-1 mM) (Vázquez-Campos et al., 2017).
Utilisation of metals (e.g., Fe, Mn, Cr, and U) as electron acceptors requires the presence of multi-heme cytochromes. Although multi-heme cytochromes are not intrinsically and exclusively employed for heavy metal reduction (e.g., decaheme DmsE for DMSO reduction), HHC are characteristic of microorganisms performing biological reduction of metals. In our case, HHC were found in all new and reference 'Ca. Methanoperedens, ' including a gene cluster from the core genome. This may well indicate that oxidised metal species were the ancestral electron acceptor to all 'Ca. Methanoperedens, ' rather than nitrate or other non-metallic oxyanion moieties.
Previous studies indicate that 'Ca. Methanoperedens sp.' BLZ1 and 'Ca. Methanoperedens ferrireducens' can both use the Fe(III) (oxyhydr)oxide ferrihydrite as electron acceptor for AOM (Ettwig et al., 2016;Cai et al., 2018). Reports showing the involvement of more crystalline forms of iron oxides for the AOM are scarce with the possibility that these organisms may drive a cryptic sulfate reduction cycle rather than a more direct Fe(III)-dependent AOM (Sivan et al., 2014). Based on the evidence to hand, it is not clear whether 'Ca. Methanoperedenaceae' or other ANME could be responsible for these observations.
Tantalising questions remain as to the role of 'Ca. Methanoperedens spp.' with regard to the ultimate fate of iron within the legacy trenches; an important consideration given the central role that iron plays in the mobilisation/retardation of key contaminants plutonium and americium (Ikeda-Ohno et al., 2014;Vázquez-Campos et al., 2017).

Polyhydroxyalkanoates Biosynthesis
Variable nutrient levels can result in the microbial production of polyhydroxyalkanoates (PHA), which are carbon-rich, energetic polymers resulting from the polymerisation of hydroxyalkanoates (Reddy et al., 2003;Martínez-Gutiérrez et al., 2018). The biosynthesis of PHA is a widespread characteristic in many groups of aerobic Bacteria (Reddy et al., 2003;Martínez-Gutiérrez et al., 2018). However, very few strict anaerobes are able to synthesise PHA, being mostly limited to syntrophic bacteria (Spang et al., 2012). In Archaea, the biosynthesis of PHA is known to occur, particularly in many Nitrososphaerales ("Thaumarchaeota") (Spang et al., 2012) and Euryarchaeota, where it has been traditionally limited to Halobacteria (Koller, 2019).
Key genes, as well as full operons, have been detected in several 'Ca. Methanoperedens' genomes, with recent experimental evidence identifying the production of PHA by 'Ca. Methanoperedens nitroreducens' (Cai et al., 2019). This suggests that the biosynthesis of PHA could be a widespread feature in 'Ca. Methanoperedens spp.' or even in all 'Ca. Methanoperedenaceae, ' indicating a possible, generalised, role in the accumulation of excess carbon at times when the carbon source (methane) is much more abundant than other nutrients and/or trace elements (Karthikeyan et al., 2015).
It has been estimated that anaerobic methane oxidisers may consume up to 80-90% of the methane produced in certain environments, mitigating its release to the atmosphere (Reeburgh, 2007). The capacity of 'Ca. Methanoperedens spp.' to accumulate PHA might have implications for the further refinement of these estimates. The inference being that methane would not solely be used for energy production or to increase cell numbers, and that 'Ca. Methanoperedens spp.' could act as a 'carbon-capture' device, especially within carbon-rich anoxic environments whenever they are the main anaerobic methane oxidisers. This would likely be the case for the LFLS test trench, where ammonium, nitrate/nitrite and total dissolved nitrogen are limiting.

CONCLUSION
While the Archaea inhabiting the LFLS trench water constitute a relatively small portion of its microbial community, they have important roles in biogeochemical cycling, especially with respect to methanogenesis, anaerobic methane oxidation and Fe(III) reduction. The diverse 'Ca. Methanoperedens spp.' are of special relevance due to their role in methane capture (and subsequent conversion to PHA), Fe cycling, and dissimilatory pathways for nitrate and, possibly, (per)chlorate reduction.
The broad phylogenetic representation of DPANN organisms, which have attracted particular attention in recent years, either for their unusual characteristics or their controversial evolutionary history, is another interesting feature of the LFLS archaeal community. The description and evaluation of the MAGs from lineages LFWA-I to IV constitutes a detailed first examination of several undescribed archaeal lineages.
The proposed LFWA-III lineages, 'Ca. Gugararchaeales, ' 'Ca. Burarchaeales, ' and 'Ca. Anstonellales' are amongst the most interesting. Foremost, they are together unusually diverse at LFLS, with 9 out of the 37 MAGs recovered in this study belonging to the 3 orders, with 4 families and 4 genera (and species) newly proposed within them. Furthermore, they have a cohesive central metabolism that likely spans the entire order with full pathways for the de novo biosynthesis of nucleotides, most amino acids, and several vitamins/cofactors. This is duly reflected in the proposed name 'Ca. Gugararchaeum adminiculabundum, ' proposed type for LFWA-IIIa, where the specific epithet translates as "self-supporting" relating to the possibility of not needing a symbiotic partner.

TAXONOMIC APPENDIX
Due to several irregularities in the process of naming candidate lineages, the European Nucleotide Archive does not allow certain taxonomic names at taxonomic levels where they should exist. For example, the phylum 'Ca. Micrarchaeota, ' and its defining candidate species 'Ca. Micrarchaeum acidophilum, ' have no defined nomenclature at class, order, or family levels. This and other issues related to the nomenclature of uncultivated Bacteria and Archaea have been previously discussed in the literature (Whitman, 2016;Konstantinidis et al., 2017Konstantinidis et al., , 2019Oren and Garrity, 2018;Chuvochina et al., 2019;Rosselló-Móra and Whitman, 2019) and we will not provide further discussion on this matter.
Nonetheless, in order to cover these gaps, we feel obliged to suggest several of these intermediate nomenclatural levels, many of which are already covered in the GTDB but not proposed or described in the literature, in addition to the nomenclatural novelties intrinsic to this manuscript.
It should be noted that the use of genome sequences as type material is currently being considered to be included into the International Code on the Nomenclature of Prokaryotes Described from Sequence Data (ICNPDSD, "SeqCode") and is not accepted by ICNP (International Code of the Nomenclature of Prokaryotes).
The type material is the metagenome assembled genome (MAG) LFW-252_1 (ERS2655302) recovered from the groundwater of the Little Forest Legacy Site (NSW, Australia). The MAG consists of 1.16 Mbp in 56 contigs with an estimated completeness of 95.7%, redundancy of 1.1%, 16S, 23S, and 5S rRNA gene, and 21 tRNAs. The GC content of this MAG is 36.5%.
The type material appears in GTDB r202 (Parks et al., 2017) as reference for s__CABMEV01 sp902385255.
The family 'Candidatus Tiddalikarchaeaceae' is circumscribed based on two independent concatenated protein phylogenies of 122 and 93 markers, and supported by the rank normalisation approach as per Parks et al. (2018). The description is the same as that of its sole genus and species. The type genus is 'Candidatus Tiddalikarchaeum. ' The family is equivalent to f__CG07-land in GTDB r89/r202 (Parks et al., 2017).
The order 'Candidatus Tiddalikarchaeales' is circumscribed based on two independent concatenated protein phylogenies of 122 and 93 markers, and supported by the rank normalisation approach as per Parks et al. (2018). The description is the same as that of its sole genus and species. The type genus is 'Candidatus Tiddalikarchaeum.' The order is equivalent to CG07-land from Probst et al. (2018) or o__CG07-land in GTDB r89/r202 (Parks et al., 2017).
This class is the only class within "Nanoarchaeota" Huber and Kreuter (2014), and equivalent to the c__Nanoarchaeia in the GTDB r89/r202 (Parks et al., 2017).

'Candidatus
Gugararchaeum adminiculabundum' (ad.mi.ni.cu.la.bun'dum. L. adj. neut. self-supporting; in reference to the limited external requirements and its suggested independence from a host/symbiote, due to the predicted presence of pathways for the biosynthesis of amino acids, purines, pyrimidines, thiamine, and riboflavin).
The type material is the metagenome assembled genome (MAG) LFW-121_3 (ERS2655302) recovered from the groundwater of the Little Forest Legacy Site (NSW, Australia). The MAG consists of 1.47 Mbp in 76 contigs with an estimated completeness of 96.8%, redundancy of 2.2%, 16S, 23S, and 5S rRNA gene, and 21 tRNAs. The GC content of this MAG is 49.8%.
The type material appears in GTDB r202 (Parks et al., 2017) as reference for s__CABMDC01 sp902384795.
The family 'Candidatus Gugararchaeaceae' is circumscribed based on two independent concatenated protein phylogenies of 122 and 93 markers, and supported by the rank normalisation approach as per Parks et al. (2018). The description is the same as that of its sole genus and species. The type genus is 'Candidatus Gugararchaeum. ' The family appears in GTDB r202 (Parks et al., 2017) as f__CABMDC01.
The order 'Candidatus Gugararchaeales' is circumscribed based on two independent concatenated protein phylogenies of 122 and 93 markers, and supported by the rank normalisation approach as per Parks et al. (2018). The description is the same as that of its sole genus and species. The type genus is 'Candidatus Gugararchaeum. ' The order is equivalent to LFWA-IIIa in this manuscript and appears as o__CABMDC01 in GTDB r202 (Parks et al., 2017).
The type material is the metagenome assembled genome (MAG) LFW-281_7 (ERS2655318) recovered from the groundwater of the Little Forest Legacy Site (NSW, Australia). The MAG consists of 1.20 Mbp in 76 contigs with an estimated completeness of 96.8%, redundancy of 5.4%, 16S and 5S rRNA gene, and 18 tRNAs. The GC content of this MAG is 57.6%.
The type material appears in GTDB r202 (Parks et al., 2017) as reference for s__CABMJK01 sp902386535.
The family 'Candidatus Burarchaeaceae' is circumscribed based on two independent concatenated protein phylogenies of 122 and 93 markers, and supported by the rank normalisation approach as per Parks et al. (2018). The description is the same as that of its sole genus and species. The type genus is 'Candidatus Burarchaeum. ' The family is equivalent to f__B9-G16 in GTDB r202 (Parks et al., 2017).
The order 'Candidatus Burarchaeales' is circumscribed based on two independent concatenated protein phylogenies of 122 and 93 markers, and supported by the rank normalisation approach as per Parks et al. (2018). The description is the same as that of its sole genus and species. The type genus is 'Candidatus Burarchaeum. ' The order is equivalent to LFWA-IIIc in this manuscript and o__B9-G16 in GTDB r202 (Parks et al., 2017).
The type material is the metagenome assembled genome (MAG) LFW-35 (ERS2655287) recovered from the groundwater of the Little Forest Legacy Site (NSW, Australia). The MAG consists of 1.33 Mbp in 68 contigs with an estimated completeness of 97.8%, redundancy of 2.2%, 16S, 23S and 5S rRNA gene, and 21 tRNAs. The GC content of this MAG is 50.3%.
The type material appears in GTDB r202 (Parks et al., 2017) as reference for s__CABMCJ01 sp902384585.
The family 'Candidatus Anstonellaceae' is circumscribed based on two independent concatenated protein phylogenies of 122 and 93 markers, and supported by the rank normalisation approach as per Parks et al. (2018). The description is the same as that of its sole genus and species. The type genus is 'Candidatus Anstonella.' This family is equivalent to f__UBA10161 in the GTDB r89/r202 (Parks et al., 2017).
'Candidatus Bilamarchaeum dharawalense' (dha.ra.wa.len'se. N.L. neut. adj. pertaining to the Dharawal, traditional owners of the lands where the Little Forest Legacy Site is located).
The type material is the metagenome assembled genome (MAG) LFW-283_2 (ERS2655319) recovered from the groundwater of the Little Forest Legacy Site (NSW, Australia). The MAG consists of 1.27 Mbp in 63 contigs with an estimated completeness of 95.7%, redundancy of 0%, 16S, 23S and 5S rRNA gene, and 20 tRNAs. The GC content of this MAG is 40.0%.
The type material appears in GTDB r202 (Parks et al., 2017) as reference for s__CAILMU01 sp902386555.
The family 'Candidatus Bilamarchaeaceae' is circumscribed based on two independent concatenated protein phylogenies of 122 and 93 markers, and supported by the rank normalisation approach as per Parks et al. (2018). The description is the same as that of its sole genus and species. The type genus is 'Candidatus Bilamarchaeum.' This family is equivalent to f__UBA10214 in the GTDB r89/r202 (Parks et al., 2017).
The order 'Candidatus Anstonellales' is circumscribed based on two independent concatenated protein phylogenies of 122 and 93 markers, and supported by the rank normalisation approach as per Parks et al. (2018). It constitutes the most inclusive clade that includes the families 'Candidatus Anstonellaceae, ' and 'Candidatus Bilamarchaeaceae.' The type genus is 'Candidatus Anstonella. ' The order is equivalent to the lineage LFWA-IIIc in this manuscript, and o__UBA10214 in the GTDB r89/r202 (Parks et al., 2017).
The family 'Candidatus Micrarchaeaceae' is circumscribed based on two independent concatenated protein phylogenies of 122 and 93 markers, and supported by the rank normalisation approach as per Parks et al. (2018). It is the most inclusive family that includes the genera 'Candidatus Micrarchaeum' and 'Candidatus Mancarchaeum.' The type genus is 'Candidatus Micrarchaeum.' This family is equivalent to f__Micrarchaeaceae in the GTDB r89/r202 (Parks et al., 2017).
The order 'Candidatus Micrarchaeales' is circumscribed based on two independent concatenated protein phylogenies of 122 and 93 markers, and supported by the rank normalisation approach as per Parks et al. (2018). The type genus is 'Candidatus Micrarchaeum. ' The order is equivalent to o__Micrarchaeales in the GTDB r89/r202 (Parks et al., 2017).
The type material is the metagenome assembled genome (MAG) LFW-144_1 (ERS2655293) recovered from the groundwater of the Little Forest Legacy Site (NSW, Australia). The MAG consists of 0.93 Mbp in 49 contigs with an estimated completeness of 93.5%, redundancy of 1.1%, 16S and 5S rRNA gene, and 20 tRNAs. The GC content of this MAG is 57.5%.
The family 'Candidatus Norongarragalinaceae' is circumscribed based on two independent concatenated protein phylogenies of 122 and 93 markers, and supported by the rank normalisation approach as per Parks et al. (2018). The description is the same as that of its sole genus and species. The type genus is 'Candidatus Norongarragalina.' This family is equivalent to f__0-14-0-20-59-11 in the GTDB r89/r202 (Parks et al., 2017).
The order 'Candidatus Norongarragalinales' is circumscribed based on two independent concatenated protein phylogenies of 122 and 93 markers, and supported by the rank normalisation approach as per Parks et al. (2018). The description is the same as that of its sole genus and species. The type genus is 'Candidatus Norongarragalina.' This order is equivalent to LFWA-II in this manuscript, and o__UBA8480 in GTDB r89/r202 (Parks et al., 2017).
The class 'Candidatus Micrarchaeia' is circumscribed based on two independent concatenated protein phylogenies of 122 and 93 markers, and supported by the rank normalisation approach as per Parks et al. (2018). This class is defined as the most inclusive class that includes the orders 'Candidatus Micrarchaeales, ' 'Candidatus Gugararchaeales, ' 'Candidatus Anstonellales, ' and 'Candidatus Norongarragalinales.' The type genus is 'Candidatus Micrarchaeum.' This class constitutes the only class within the phylum 'Candidatus Micrarchaeota ' Baker and Dick (2013). It is equivalent to the c__Micrarchaeia in the GTDB r89/r202 (Parks et al., 2017).

DATA AVAILABILITY STATEMENT
Annotated assemblies are available at ENA under project PRJEB21808 (https://www.ebi.ac.uk/ena/browser/view/ PRJEB21808) and sample identifiers ERS2655284-ERS2655320. See details in Supplementary Table 7. The output of the different analyses, including the reference genomes, are available at Zenodo (doi: 10.5281/zenodo.3365725).

AUTHOR CONTRIBUTIONS
XV-C conceptualized the study, carried out the data curation and formal analysis, investigated and visualized the data, performed the methodology, carried out the project administration, wrote the original draft, and wrote, reviewed, and edited the manuscript. ASK wrote the original draft and wrote, reviewed, and edited the manuscript. MWB wrote, reviewed, and edited the manuscript. MRW carried out the resource procurement, supervised the study, and wrote, reviewed, and edited the manuscript. TEP carried out the resource procurement, funding acquisition, and project administration and provided the sitespecific expertise. TDW carried out the resource procurement, funding acquisition, and project administration and supervision, and wrote, reviewed, and edited the manuscript. All authors contributed to the article and approved the submitted version.

FUNDING
MRW acknowledges the support of the Australian Federal Government NCRIS scheme, via Bioplatforms Australia. MRW and XV-C acknowledge support from the New South Wales State Government RAAP scheme and the UNSW RIS scheme. ASK, TEP, and TDW acknowledge funding by the Australian Research Council (Discovery Project DP210103727) as well as from the Australian Nuclear Science and Technology Organisation (ANSTO).